[...] bind Congo red in Shigella [...] was determined and analyzed. It was
rich in A and T. The direction of transcription was determined [...]
cartridge. An open reading frame [...] was found. Proteins of 30, 27, and 21
kilodaltons, all corresponding to the predicted frame, were produced in
minicells containing the virF locus. [...] proteins were expressed only
weakly in minicells with the 230-kilobase plasmid.



The [...] and multiplication of shigellae in colonic
epithelial cells are the essential attributes in the early stage
leading to bacillary dysentery (11). The genetic determinants
[...] are located on a large 140-megadalton
(MDa) (230-kilobase [kb]) plasmid (22, 23) as well as on the
chromosome in S. flexneri (22). The ability of shigellae to
bind Congo red has been implicated in their virulence (12,
[...]) [...] described as Pcr+ (pigmentation of
Congo red). In Shigella flexneri 2a strain YSH6000, at least
[...] loci on the 230-kb plasmid have been shown to
be essential for the Pcr+ phenotype (26). They are located
more than 70 kb apart from each other (26). Apparently in
paradox to these observations, a region called virF of about
[...] has been cloned in Escherichia coli K-12 by selecting
for the Pcr+ phenotype. Subsequently, virF has been found
essential but not sufficient both for the Pcr+ phenotype and
for virulence in S. flexneri (19).

In the present study, the DNA sequence of virF and its
products have been determined, and their characteristics are
described.

MATERIALS AND METHODS 

Isolation of plasmid DNA. Plasmid DNA was isolated as 
described previously (19). 

Restriction endonuclease analysis and gel electrophoresis.
The methods for restriction endonuclease analysis and gel
electrophoresis have been described previously (19, 25).
[...] of pMYSH6509 made by cleaving with BamHI
were filled in [...] with Klenow polymerase at 16°C for 2 h.

Ligation and transformation. The cleaved vector
and the DNA to be cloned were mixed and ligated with
T4 ligase by incubating at 14°C for 12 h. The DNA molecules
[...] were transformed into E. coli MC1061 (2) by the
method of [...] (16) or transferred to JM101 (13) or
M2124 (28) by the method of Curtiss (4).

DNA sequence determination by the M13 dideoxy method.
Fragments of the virF locus were cloned into M13mp8
(14) and transformed into JM101. Their DNA sequence was
determined by the method of Sanger et al. (20, 21).

Measurement of MICs of chloramphenicol. Plasmid-carrying
MC1061 was grown overnight at 30°C in LN broth (24)
[...]. One-twentieth milliliter
of 100-fold dilutions of these cultures was inoculated
into Penassay broth (Difco Laboratories, Detroit, Mich.)
containing twofold serial dilutions of chloramphenicol. The
[...] were read after 14 h of standing incubation at both 37
and 30°C and expressed as the MIC.
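
The read-out described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the top concentration of 200 µg/ml, the number of tubes, and the growth pattern are hypothetical, and the MIC is taken in the usual sense of the lowest concentration at which no growth is seen.

```python
def twofold_series(top, steps):
    """Return `steps` concentrations (ug/ml), halving each time from `top`."""
    return [top / (2 ** i) for i in range(steps)]


def read_mic(concentrations, growth):
    """Return the lowest concentration showing no growth.

    `growth` maps each concentration to True (growth seen) or False (clear).
    Returns None if growth was seen at every concentration.
    """
    inhibitory = [c for c in concentrations if not growth[c]]
    return min(inhibitory) if inhibitory else None


# Hypothetical series and read-out (not data from this study):
# growth only in tubes below 6.25 ug/ml.
series = twofold_series(200.0, 8)
growth = {c: c < 6.25 for c in series}
mic = read_mic(series, growth)
print(f"MIC = {mic} ug/ml")
```

Because each tube differs from its neighbor by a factor of two, an MIC read this way is resolved only to the nearest twofold step.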

Construction of vector pMY6004. Since the expected molecular
size of the virF product was similar to that of the product of the
beta-lactamase gene of pBR322, it was necessary to replace the
ampicillin resistance gene of pBR322 with another drug
resistance gene. First, a BglII linker was inserted into the
HincII site of the ampicillin resistance gene, and the BglII-EcoRI
fragment was replaced by the BamHI-EcoRI fragment
coding for trimethoprim resistance from R388 (29). The
HindIII-SalI fragment within the tetracycline resistance
gene was removed, and the sticky ends were filled in by
treatment with Klenow polymerase and then ligated with T4
ligase to yield vector pMY6004.

Construction of M2124 containing the 230-kb plasmid. The
cointegrate of pMYSH6000 and a replication-thermosensitive
derivative of R388::Tn5 was conjugally
transferred into M2124. Since S. flexneri containing the
cointegrate proved to be fully virulent and Pcr+ (19), the
Pcr+ determinants of the plasmid had not been
interrupted during cointegrate formation. The transconjugant
was cured of the replication-thermosensitive derivative
of R388::Tn5 by growth in drug-free LN broth at 42°C.
A derivative of M2124 which was kanamycin sensitive and
trimethoprim sensitive, subsequently confirmed by molecular
analysis and its Pcr+ phenotype to be pMYSH6000::IS50,
was selected.

Analysis of plasmid-coded proteins. Minicells from 250-ml
stationary-phase cultures of strain M2124 containing
[...] pMYSH6513 were [...] at [...] in Mueller-Hinton
broth (Difco Laboratories, Detroit, Mich.) containing
[...] trimethoprim. M2124 containing
pMYSH6000::IS50 was grown at 37°C in LN broth. Minicells were
prepared by the method of Andres et al. (1), preincubated at
30 or 37°C for 1 h in methionine assay-minimal glucose salts
medium (9), labeled with [35S]methionine at 30 or 37°C for 4
h, collected by centrifugation, and incubated in LN broth
containing 1.5% casein hydrolysate at 30 or 37°C for 10 min.
The collected minicells were lysed by the method of Andres
et al. (1) and analyzed by electrophoresis in a 14%
polyacrylamide gel containing sodium dodecyl sulfate.

FIG. 1. Deletion map of the virF locus. (A) Deletions derived from pMYSH6501; (B) deletions derived from pMYSH6502 (19).
Separate symbols mark the 230-kb plasmid fragment, the pBR322 vector, deleted regions, and regions containing virF termini. The arrow shows the reading
direction of the virF locus. There is a linker-derived BamHI site at the end of each deletion. Restriction sites for the enzymes shown
that can no longer be cut because of manipulation of these sites are shown in parentheses.

DNA-DNA hybridization. The method used for DNA-DNA 
hybridization has been described previously (19). In brief, 
filters were hybridized at 42°C with a hybridization mixture 
containing 0.6 M NaCl, 0.2 M Tris hydrochloride (pH 7.9),
0.02 M EDTA, 0.5% sodium dodecyl sulfate, and 60% 
formamide. 
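
To give a feel for how stringent these conditions are, the following sketch applies a widely used empirical approximation for the melting temperature of long DNA-DNA hybrids. The formula and its coefficients are not taken from this paper; they are a textbook approximation and should be treated as an assumption, as should the probe length of 1,000 bp (the text does not state it). Only the NaCl is counted as monovalent cation, and the GC content of 30% is the value reported for the virF sequence.

```python
import math


def tm_dna_dna(na_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (deg C) of a long DNA-DNA hybrid.

    Empirical approximation (an assumption, not the authors' method):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


# 0.6 M NaCl and 60% formamide are from the hybridization mixture;
# 30% GC is the reported composition; 1000 bp is a hypothetical probe length.
tm = tm_dna_dna(0.6, 30.0, 60.0, 1000)
margin = tm - 42.0  # distance between estimated Tm and the 42 deg C used
print(f"estimated Tm = {tm:.1f} C, margin = {margin:.1f} C")
```

Under these assumptions the estimate comes out at roughly 53°C, so hybridization at 42°C would sit about 11°C below the estimated melting temperature of a perfectly matched AT-rich hybrid.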

RESULTS 

DNA sequence analysis of virF. The minimum virF region
has been determined by making Bal31 deletions from the
BamHI site on the pBR322 part of pMYSH6501 and
pMYSH6502 (19) (Fig. 1) and by determining their Pcr
phenotype. The deletion ends of all these mutants had been
linked to a BamHI linker in the previous study (19). Therefore,
various parts of the virF locus could be cleaved by
double digestion with EcoRI for the site on the pBR322
vector and BamHI for the site on the linker at the deletion
end. These fragments were recloned into an M13mp8 vector.
The deletion ends of the mutants chosen were separated by
50 to 250 bp and derived from two plasmids with the virF
locus in opposite orientations, as shown in Fig. 1. It was
therefore rather easy to read the sequences in both directions
and in duplicate by determining 200- to 300-bp sequences
from each M13 clone. The sequence listed in Fig. 2 is shown
in the 5' to 3' direction to correspond to the direction of the
arrow in Fig. 1.
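
The sequencing strategy above amounts to tiling the region with overlapping reads from both orientations. A minimal sketch of that coverage argument follows; the deletion-end positions (every 150 bp) and the read length (250 bp) are hypothetical values chosen from within the ranges quoted in the text, not the actual clones.

```python
def coverage(length, starts, read_len, forward=True):
    """Count how many reads cover each base of a region `length` bp long.

    Each read begins at a deletion end in `starts` and extends `read_len`
    bp rightward (forward=True) or leftward (forward=False).
    """
    depth = [0] * length
    for s in starts:
        if forward:
            lo, hi = s, min(length, s + read_len)
        else:
            lo, hi = max(0, s - read_len), s
        for i in range(lo, hi):
            depth[i] += 1
    return depth


LENGTH = 1024                                  # size of the sequenced region
fwd_starts = list(range(0, LENGTH, 150))       # hypothetical ends, one orientation
rev_starts = list(range(LENGTH, 0, -150))      # hypothetical ends, other orientation

fwd = coverage(LENGTH, fwd_starts, 250, forward=True)
rev = coverage(LENGTH, rev_starts, 250, forward=False)

# True when every base is read at least once in each direction.
both_strands = all(f >= 1 and r >= 1 for f, r in zip(fwd, rev))
print(both_strands)
```

With ends no more than 250 bp apart and reads of 200 to 300 bp, adjacent reads overlap, which is why every position can be read in both directions.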

Direction of transcription of virF. A chloramphenicol resistance
cartridge (3) containing the structural gene coding
for chloramphenicol acetyltransferase but without the promoter
was inserted into the BglII site of pMYSH6509 (Fig.
1). The cartridge was inserted from the left to the right in
pMYSH6514 with respect to the direction shown in Fig. 1
and in the opposite direction in pMYSH6515. When transformed
into E. coli K-12 strain MC1061, the MIC of chloramphenicol
for the strain carrying pMYSH6514 was 100
µg/ml at 37°C and 50 µg/ml at 30°C, whereas the MIC for the
strain carrying pMYSH6515 was 6.25 µg/ml at both 37 and
30°C. This was in spite of the possible readthrough in the
latter strain from the promoter for tetracycline resistance on
the vector located upstream from the cartridge.

Analysis of protein products of virF. A cloned large fragment
containing virF had been shown to produce a ca.
20-kilodalton (kDa) protein (19). Its molecular size was 
determined to be 21 kDa by more precise gel analysis 
(described below). Furthermore, the cloned virF region had 

GCAAATACTT AGCTTGTTGC ACAGAGAAAT AGAAGCTGCA TAAGCTCTTT
CTTCAAAAAA TGTAAATNAN GTTAAATANA GGAAAAATTA CTTAATCTAT
CTTAATAACG GAAAGATTTT GTATACAATC ACTGTTACAC AAATTTCTTA
GTTACTCTGT AAACACTAAA TATAGTTTGG TTATTCTGTT GAATTTATGA
TGGATATGGG ACATAAAAAC AAAATAGATA TAAAGGTTCG CTTGCATAAT
TATATTATTT TATATGCAAA AAGGTGTTCA ATGACGGTTA GCTCAGGCAA
TGAAACTTTG ACTATCGATG AAGGGCAAAT TGCTTTTATA GAGCGAAATA
TACAAATAAA CGTCTCCATA AAAAAATCTG ATAGCATTAA TCCATTTGAG
ATTATAAGCC TTGACAGAAA TTTATTATTA AGCATTATTA GAATAATGGA
ACCAATTTAT TCATTTCAAC ACTCCTATTC TGAGGAGAAA AGGGGGTTAA
ACAAAAAAAT ATTCCTCCTC TCTGAGGAGG AGGTTTCTAT CGATTTGTTC
AAATCTATAA AACAGATGCC TTTCGGCAAA AGAAAGATCT ATAGTTTAGC
TTGCCTTTTA TCAGCTGTTT CTGATGAGGA AGCTTTATAT ACTTCGATAT
CGATAGCTTC TTCTCTTAGT TTTTCTGATC AGATAAGGAA GATTGTTGAA
AAAAACATCG AGAAGAGATG GCGTCTTTCT GATATTTCAA ATAACTTGAA
TTTATCAGAA ATAGCTGTTA GAAAACGATT GGAGAGTGAA AAATTAACAT
TTCAACAAAT CCTTCTTGAT ATTCGCATGC ATCATGCAGC AAAGCTTTTA
TTGAATAGTC AAAGCTATAT TAATGATGTA TCAAGACTTA TCGGAATATC
AAGCCCATCT TATTTTATAA GGAAATTTAA TGAATATTAT GGTATAACTC
CAAAGAAATT TTACTTATAT CATAAAAAAT TTTANATGCT TCATAGCCCA
NNGCTATTGC CAGATGGGTT TTCC

FIG. 2. [...] of the derivatives pMYSH6511 and pMYSH6512. Tentative attenuator sequences are underlined [...].
[...] open reading frame. Probable ribosome binding sites (Shine-Dalgarno sequences) [...].

FIG. 3. Construction of pMYSH6513. Ap, Coding region for ampicillin resistance from pBR322; Tc, coding region for tetracycline
resistance from pBR322; Tp, coding region for trimethoprim resistance from R388. The virF fragment was derived from pMYSH6509. Open
boxes, sites derived from linkers.

TABLE 1. Codon usage in virF and in E. coli



FIG. 4. Protein analysis by the minicell method. Lanes 1 and 6,
pMYSH6000::IS50; lanes 2 and 4, pMY6004; lanes 3 and 5,
pMYSH6513. Lanes 1, 2, and 3, growth at 30°C; lanes 4, 5, and 6,
growth at 37°C. Numbers at the sides (30K, 27K, and 21K) indicate
molecular sizes (in kilodaltons).



been complemented in trans both for Pcr+ and for virulence
when it coexisted in an S. flexneri 2a YSH6000 derivative
with a small deletion in the SalI fragment F of the 230-kb
plasmid (19). To analyze the product more accurately, a
fragment with the BamHI site of the linker at one end and the
HaeIII site at the other was cleaved from pMYSH6509 (Fig.
1 and 3), the BamHI site was filled in, and then both ends
were linked to an EcoRI linker. This small virF fragment
with EcoRI sites at both termini was linked to the EcoRI site
of a pBR322-derived trimethoprim resistance vector,
pMY6004, to obtain pMYSH6513 (Fig. 3). A minicell analysis
with pMYSH6513 and its vector, pMY6004, as a control
revealed the production of proteins of about 30 and 27 kDa
at both 30 and 37°C in addition to the one of 21 kDa identified
previously (19) (Fig. 4). These three proteins were expressed
only weakly from the 230-kb plasmid.

Interpretation of DNA sequence of virF. Since the 5' end of
the DNA sequence shown in Fig. 2 is the deletion end of
pMYSH6509, and the 3' end corresponds to the deletion end
of pMYSH6510, this sequence is considered the minimum
required for the Pcr+ phenotype in E. coli K-12. One of the
striking characteristics of the sequence is the richness in A
and T. The GC content is about 30%. The open reading
frame found within this sequence codes for 262 amino acid
residues (boxed with thick lines in Fig. 2), and another codes
for 61 amino acid residues in the opposite direction (boxed
with thin lines in Fig. 2). Since the deletion mutations

a Data from Post and Nomura (17).
b Data from Greene et al. (7).

[met] [met] asp [met] gly his lys asn lys ile asp ile lys val arg leu his asn tyr ile
ile leu tyr ala lys arg cys ser [met] thr val ser ser gly asn glu thr leu thr ile
asp glu gly gln ile ala phe ile glu arg asn ile gln ile asn val ser ile lys lys
ser asp ser ile asn pro phe glu ile ile ser leu asp arg asn leu leu leu ser ile
ile arg ile [met] glu pro ile tyr ser phe gln his ser tyr ser glu glu lys arg gly
leu asn lys lys ile phe leu leu ser glu glu glu val ser ile asp leu phe lys ser
ile lys glu [met] pro phe gly lys arg lys ile tyr ser leu ala cys leu leu ser ala
val ser asp glu glu ala leu tyr thr ser ile ser ile ala ser ser leu ser phe ser
asp gln ile arg lys ile val glu lys asn ile glu lys arg trp arg leu ser asp ile
ser asn asn leu asn leu ser glu ile ala val arg lys arg leu glu ser glu lys leu
thr phe gln gln ile leu leu asp ile arg [met] his his ala ala lys leu leu leu asn
ser gln ser tyr ile asn asp val ser arg leu ile gly ile ser ser pro ser tyr phe
ile arg lys phe asn glu tyr tyr gly ile thr pro lys lys phe tyr leu tyr his lys
lys phe

FIG. 5. Amino acid sequence predicted from the large open reading frame. Within the large open reading frame, seven methionine codons 
are found (boxed). 



extending inside the vertical broken lines at 78 base pairs
(bp) (corresponding to the deletion end of pMYSH6511) and
993 bp (corresponding to the deletion end of pMYSH6512)
gave rise to Pcr-, the sequences between 1 and 78 bp and
between 993 and 1024 bp are essential for Pcr+. Within the
former sequence, the -35 region and -10 sequence
(Pribnow box) can be seen (solid underline) (18), but no clear
ribosome binding site (27) was found. Within the latter
sequence there was a typical terminator (18), consisting of
GC-rich inverted repeats (waved underline) with four
continuous T's in the 3' direction. No promoterlike sequence
was found for the opposite reading frame for 61 amino acid
residues in the latter sequence. This is consistent with the
evidence obtained by the chloramphenicol cartridge. The
amino acid sequence coded by the larger open reading frame
is shown in Fig. 5. Since the nucleotide sequence is rich in A
and T, the frequencies of codon usage are also strikingly
characteristic compared with those of E. coli (7, 17) (Table 1).
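
Both quantities discussed here, base composition and codon usage, are simple counts over the sequence. The sketch below shows one way to compute them; the function names are illustrative, and the 48-nucleotide fragment is only a short in-frame excerpt of the large open reading frame as it appears in Fig. 2 (the codons for tyr ile ile leu tyr ala lys arg cys ser met thr val ser ser gly), not the full sequence.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(orf):
    """Count in-frame codons, ignoring any incomplete trailing codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))


# Short in-frame excerpt of the large open reading frame (from Fig. 2).
fragment = "TATATTATTTTATATGCAAAAAGGTGTTCAATGACGGTTAGCTCAGGC"

print(f"GC content of excerpt: {gc_content(fragment):.2f}")
for codon, n in codon_usage(fragment).most_common(3):
    print(codon, n)
```

Run over the whole open reading frame, the same counts would give the kind of per-codon frequencies that Table 1 compares with E. coli.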



DISCUSSION 

The data presented in this study are compatible with the 
interpretation that the DNA sequence shown in Fig. 2 
(boxed with thick solid lines) is transcribed from the left to 
the right. A -35 region and a -10 sequence (Pribnow box) 
were found. However, no clear ribosome binding site was 
found. It should be pointed out in this connection that either 
one sequence or the other preceding the first, fourth, and 
fifth ATG codons may be the ribosome binding site. Corre- 
sponding to these initiation codons, proteins of about 30, 27, 
and 21 kDa may be produced. Their molecular sizes corre- 
spond exactly to those produced by the minicells. 
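
The correspondence between start codons and product sizes can be checked with a rough calculation. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb rather than a value from this paper, and takes the positions of the first, fourth, and fifth methionines (residues 1, 29, and 84) from a count along the sequence in Fig. 5.

```python
AVG_RESIDUE_DA = 110.0   # rule-of-thumb average residue mass (assumption)
ORF_LENGTH = 262         # residues in the large open reading frame

# Residue numbers of the candidate initiation methionines, counted in Fig. 5.
START_RESIDUES = {"first ATG": 1, "fourth ATG": 29, "fifth ATG": 84}


def product_length(start, orf_length=ORF_LENGTH):
    """Number of residues when translation starts at residue `start`."""
    return orf_length - start + 1


def approx_kda(n_residues):
    """Approximate molecular mass in kilodaltons."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


for label, start in START_RESIDUES.items():
    n = product_length(start)
    print(f"{label}: {n} residues, about {approx_kda(n):.1f} kDa")
```

The estimates fall in the same range as the 30-, 27-, and 21-kDa bands; a rule-of-thumb residue mass cannot be expected to reproduce gel-derived sizes more closely than that.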

During a search for the ribosome binding site, we found a 
characteristic sequence (shown by broken underlines in Fig. 
2). It is an attenuator sequence (10) consisting of GC-rich 



inverted repeats with three continuous T's in the 3' direc- 
tion. The downstream strand of the repeats is a typical 
ribosome binding site (27). These structures may regulate the 
production of these proteins. If this ribosome binding site is 
really functional, a 16-kDa protein would be produced. No 
such protein was found in the minicell analysis (Fig. 4). This
is presumably because the ribosome binding site is blocked 
due to the stem-loop structure. 
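
A search of this kind, for inverted repeats followed by a run of T's, can be sketched as follows. The function names, the stem and loop lengths, and the example sequence are all hypothetical; the example is a made-up GC-rich hairpin, not the attenuator of Fig. 2, and only perfect repeats are reported.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, stem_sequence, loop_length) for perfect inverted repeats."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] == target:
                hits.append((i, left, loop))
    return hits


def t_run_after(seq, stem_end, run=3):
    """True if `run` consecutive T's immediately follow position `stem_end`."""
    return seq[stem_end:stem_end + run] == "T" * run


# Hypothetical example: stem GGCCGC, loop TTTT, stem GCGGCC, then T's.
example = "AAGGCCGCTTTTGCGGCCTTTTAA"
for start, stem_seq, loop in find_stem_loops(example):
    end = start + 2 * len(stem_seq) + loop
    print(start, stem_seq, loop, t_run_after(example, end))
```

Overlapping hits can appear when flanking bases also pair, so the output is best read as a list of candidates to inspect rather than a single predicted structure.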

For the possible proteins translated from the first, second, 
and third ATG codons, a signal peptide-like amino acid 
sequence (15) was seen at their N terminus (amino acids 1 to 
24). If the signal peptide is cleaved during passage through 
the inner membrane, the resulting protein may become about 
27 kDa. 

Hale et al. (8) reported that proteins of 20, 25, 38, 43, 53, 
62, and 78 kDa were associated with virulence in Shigella 
spp. and enteroinvasive Escherichia coli and that they were 
in a repressed state in organisms grown at 30°C. At this 
moment it cannot be decided which of the 30-, 27-, and 
21-kDa proteins we found is really responsible for the Pcr+
phenotype. We also cannot decide the correlation between 
the proteins found by Hale et al. (8) and by us. 

The secondary structure of the protein coded by this large 
open reading frame was analyzed by the method of Garnier
(5) with a computer program. Figure 6 shows the location of 
the predicted alpha-helix, beta-sheet, and beta-turn confor- 
mations. This protein is highly characteristic in its rich 
beta-sheets. Its helical content was estimated to be 36%. It 
has been shown that Congo red staining of the amyloid-fibril 
protein deposited in amyloidosis is dependent on the beta- 
pleated sheet configuration (6). If the virF product is really 
rich in beta-sheet, a similar direct interaction between Congo 
red and the virF protein may be responsible for the Pcr+
phenotype. 

Finally, Southern hybridization analysis with a probe
made by cleaving pMYSH6509 with BamHI and HaeIII has
revealed that the large virulence plasmids of S. flexneri 2a,
2b, 3a, 3b, 3c, 4, 5, and 6, S. dysenteriae, S. boydii, S.
sonnei, and enteroinvasive E. coli have a similar sequence,
although weak nonspecific hybridization was encountered,
presumably due to the AT richness (data not shown). These
observations indicate that the virF protein is required for the
invasion process of all these pathogens.

FIG. 6. Predicted secondary structure of the large open reading frame. Alpha-helix, beta-sheet, and beta-turn are shown by wavy
lines, jagged lines, and end brackets, respectively. Numbers show amino acid residues from the N terminus.